119 research outputs found

Location Privacy in Spatial Crowdsourcing

Author: A Liu
A Pham
Ashwin Machanavajjhala
Bo Zhang
C-Y Chow
Delphine Christin
Hien To
Latanya Sweeney
Lefeng Zhang
M Shin
Publication date: 22/04/2017

Spatial crowdsourcing (SC) is a new platform that engages individuals in collecting and analyzing environmental, social and other spatiotemporal information. With SC, requesters outsource their spatiotemporal tasks to a set of workers, who perform the tasks by physically traveling to the tasks' locations. This chapter identifies privacy threats toward both workers and requesters during the two main phases of spatial crowdsourcing: tasking and reporting. Tasking is the process of identifying which tasks should be assigned to which workers; it is handled by a spatial crowdsourcing server (SC-server). Reporting is the phase in which workers travel to the tasks' locations, complete the tasks and upload their reports to the SC-server. The challenge is to enable effective and efficient tasking and reporting in SC without disclosing the actual locations of workers (at least until they agree to perform a task) or the tasks themselves (at least to workers who are not assigned to those tasks). This chapter provides an overview of the state of the art in protecting users' location privacy in spatial crowdsourcing. We provide a comparative study of a diverse set of solutions in terms of task publishing modes (push vs. pull), problem focuses (tasking and reporting), threats (server, requester and worker), and underlying technical approaches (from pseudonymity, cloaking, and perturbation to exchange-based and encryption-based techniques). The strengths and drawbacks of the techniques are highlighted, leading to a discussion of open problems and future work.

arXiv.org e-Print Archive

Crossref

Cor-Split: Defending Privacy in Data Re-Publication from Historical Correlations and Compromised Tuples

Author: A. Machanavajjhala
B.C.M. Fung
J. Pei
J.W. Byun
K. LeFevre
N. Li
P. Samarati
X. Xiao
Publication date: 01/01/2009

Several approaches have been proposed for privacy-preserving data publication. In this paper we consider the important case in which a certain view over a dynamic dataset has to be released a number of times during its history. The insufficiency of one-shot publication techniques for subsequent releases has been previously recognized, and some new approaches have been proposed. Our research shows that relevant privacy threats, not recognized by previous proposals, can occur in practice. In particular, we show the cascading effects that a single compromised tuple (or a few) can have in data re-publication when coupled with the ability of an adversary to recognize historical correlations among released tuples. A theoretical study of the threats leads us to a defense algorithm, implemented as a significant extension of the m-invariance technique. Extensive experiments using publicly available datasets show that the proposed technique preserves the utility of published data and effectively protects against the identified privacy threats.

CiteSeerX

Crossref

AIR Universita degli studi di Milano

Archivio istituzionale della ricerca - Università di Cagliari

PARTS – Privacy-Aware Routing with Transportation Subgraphs

Author: A Machanavajjhala
ADI Kramer
L Sweeney
P Golle
T Wang
Y Zheng
Publication venue: Springer Science and Business Media LLC
Publication date: 01/11/2017

To ensure privacy for route planning applications and other location-based services (LBS), the service provider must be prevented from tracking a user’s path during navigation at the application level, while the navigation functionality is preserved. We introduce the PARTS algorithm, which splits route requests into route parts that are submitted to an LBS in an unlinkable way. Combined with dummy requests and time shifting, the approach achieves stronger privacy. We show that the algorithm protects privacy in the presence of a realistic adversary model while maintaining service quality.

University of Regensburg Publication Server

Crossref

General and specific utility measures for synthetic data

Author: Breiman L.
Drechsler J.
Kinney S. K.
Machanavajjhala A.
McClure D.
Miranda J.
Nowok B.
R Core Team
Raghunathan T. E.
Reiter J. P.
Wilson M.
Woo M.‐J.
Woo Y. M. J.
Publication venue: Wiley
Publication date: 18/06/2017

Data holders can produce synthetic versions of datasets when concerns about potential disclosure restrict the availability of the original records. This paper is concerned with methods to judge whether such synthetic data have a distribution that is comparable to that of the original data, which we term general utility. We consider how general utility compares with specific utility, the similarity of results of analyses from the synthetic data and the original data. We adapt a previous general measure of data utility, the propensity score mean-squared-error (pMSE), to the specific case of synthetic data and derive its distribution for the case when the correct synthesis model is used to create the synthetic data. Our asymptotic results are confirmed by a simulation study. We also consider two specific utility measures, confidence interval overlap and standardized difference in summary statistics, which we compare with the general utility results. We present two examples that apply this comparison of general and specific utility to real data syntheses and make recommendations for their use in evaluating synthetic data.
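As a rough illustration of the pMSE measure named in this abstract: in its usual form, the original and synthetic records are stacked, a model estimates each record's probability of being synthetic (its propensity score), and pMSE is the mean squared deviation of those scores from the share of synthetic records in the stacked file. The sketch below assumes the propensity scores have already been estimated elsewhere; the function name `pmse` and the sample scores are hypothetical and are not taken from the paper.

```python
def pmse(propensity_scores, synthetic_share):
    """Mean squared deviation of propensity scores from the synthetic share.

    propensity_scores: estimated P(record is synthetic), one per stacked record
                       (assumed to come from a model fitted elsewhere).
    synthetic_share:   fraction of stacked records that are synthetic.
    """
    if not propensity_scores:
        raise ValueError("need at least one propensity score")
    n = len(propensity_scores)
    return sum((p - synthetic_share) ** 2 for p in propensity_scores) / n


# Hypothetical scores for a stacked file of 4 original + 4 synthetic records.
indistinguishable = [0.5] * 8          # the model cannot tell the two apart
separable = [0.0] * 4 + [1.0] * 4      # the model separates them perfectly

print(pmse(indistinguishable, 0.5))  # 0.0
print(pmse(separable, 0.5))          # 0.25
```

A value near zero suggests the synthetic records are hard to tell apart from the originals; larger values indicate lower general utility.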

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Routes for breaching and protecting genetic privacy

Author: A Acquisti
A Cavoukian
A Kong
A Machanavajjhala
A Narayanan
AD Johnson
AJ Pakstis
AK Manning
AL McGuire
Arvind Narayanan
B Fons
B Malin
BA Malin
BM Henn
C Dwork
C Shannon
CD Huff
D Clayton
D He
D Zubakov
DJ Solve
DR Nyholt
DW Craig
EA Zerhouni
EE Schadt
EM Ramos
F Liu
G Church
H Lango Allen
H Li
HK Im
HS Venter
J Burn
J Gitschier
J Kaiser
J Kaye
J Lee
J Marchini
JE Lunshof
JH Park
JM Oliver
JP Roberts
K Benitez
K El Emam
K Silventoinen
KA Tryka
KB Jacobs
KS Kendler
L Kamm
L Sweeney
LA Sweeney
LAP Kohn
LL Rodriguez
M Canim
M Gymrek
M Kantarcioglu
M Kayser
MD Mailman
N Chatterjee
N Homer
NN Taleb
P Bohannon
P Kwok
P Ohm
P Paillier
PM Visscher
R Braun
R Drmanac
R Khan
R Noumeir
RL Bennett
S Byers
S McClure
S Sankararaman
S Walsh
SE Brenner
SF Terry
SH Friend
T Lumley
TE King
V Bafna
W Fu
W Hartzog
WG Hill
WW Lowrance
XL Ou
Yaniv Erlich
Z Lin
Publication date: 01/12/2013

We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

PubMed Central

Optimal constraint-based decision tree induction from itemset lattices

Author: A Lew
A Machanavajjhala
A Moore
B Ganter
C Bucila
C Nadeau
Elisa Fromont
F Bonchi
G Blanchard
H Schumacher
HA Chipman
HJ Payne
IH Witten
J-F Boulicaut
JR Quinlan
L Breiman
L Hyafil
L Sweeney
MJ Zaki
MN Garofalakis
MR Garey
N Pasquier
P Samarati
P Turney
S Esmeir
Siegfried Nijssen
T Imielinski
W Buntine
Publication venue: Springer
Publication date: 01/01/2010

In this article we show that there is a strong connection between decision tree learning and local pattern mining. This connection allows us to solve the computationally hard problem of finding optimal decision trees in a wide range of applications by post-processing a set of patterns: we use local patterns to construct a global model. We exploit the connection between constraints in pattern mining and constraints in decision tree induction to develop a framework for categorizing decision tree mining constraints. This framework allows us to determine which model constraints can be pushed deeply into the pattern mining process, and allows us to improve the state of the art in optimal decision tree induction.

An Algebraic Theory for Data Linkage

Author: A Machanavajjhala
B Jacobs
BA Davey
CA Gunter
DJ Foulis
DS Scott
EF Codd
F Wehrung
J Attard
J Kohlas
JY Halpern
L Sweeney
M Pouly
M Shulman
ND Belnap
O Arieli
P Schultz
PP Shenoy
R Haenni
T Fritz
V Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 12/10/2018

There are countless sources of data available to governments, companies, and citizens, which can be combined for good or evil. We analyse the concepts of combining data from common sources and linking data from different sources. We model the data found in a single source, together with its information content, by an ordered partial monoid, and the transfer of information between sources by different types of morphisms. To capture the linkage between a family of sources, we use a form of Grothendieck construction to create an ordered partial monoid that brings together the global data of the family in a single structure. We apply our approach to database theory and axiomatic structures in approximate reasoning. Thus, ordered partial monoids provide a foundation for the algebraic study of information gathering in its most primitive form.

arXiv.org e-Print Archive

Crossref

Cronfa at Swansea University

On scaling up sensitive data auditing

Author: Agrawal R.
Amsterdamer Yael
Bhagwat D.
Geerts F.
Glavic B.
Green Todd J.
Kaushik R.
Lampson Butler
Machanavajjhala A.
Miklau G.
Motwani R.
Sarma A. Das
Seshadri P.
Suciu D.
Weitzner D. J.
Publication venue: VLDB Endowment

Crossref

De-identifying a public use microdata file from the Canadian national discharge abstract database

Author: A Dale
A de Waal
A Gionis
A Hundepool
A Machanavajjhala
A Meyerson
A Narayanan
Agency for Healthcare Research and Quality
B Hore
B Yolles
B-C Chen
BCM Fung
C Hogue
C Mackie
C Marsh
C Skinner
Canada Statistics
Canadian Institute for Health Information
CE Shannon
CK Liew
D Altman
D Defays
D Hutchon
D Lafky
David Paton
DB Rubin
Department of Health and Human Services
E Boyko
Federal Court (Canada)
Fida Dankar
G Aggarwal
G Duncan
G Loukides
G Sande
G Sullivan
GD Smith
GR Heer
Gunes Koru
H Kargupta
J Castro
J Domingo-Ferrer
J Jimenez
J Schoenman
J Xu
JJ Kim
JP Gouweleeuw
K Abraham
K Benitez
K El Emam
K LeFevre
Khaled El Emam
L Alexander
L Sweeney
L Willenborg
LA Alexander
LH Cox
M Barbaro
M Templ
ME Nergiz
National Committee on Vital and Health Statistics
P Doyle
P Kooiman
P Nanopoulos
P Samarati
R Bayardo
R Gopal
RA Dandekar
RJ Bayardo
RJA Little
S Fienberg
S Hansell
S Ochoa
Statistics Canada
T de Waal
T Delamothe
T Hedrick
T Zeller Jr
V Ciriani
V Iyengar
V Torra
VS Iyengar
W Lowrance
W Winkler
WE Winkler
X Xiao
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of this data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include: confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records.
Methods: Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy.
Results: Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression.
Conclusions: The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
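To make the re-identification threshold in this abstract concrete, here is a minimal sketch under a stated assumption: the abstract does not say how the probability was computed, so the code uses a common simplification in which a record's risk is 1 divided by the number of records sharing its quasi-identifier values. The function name `risky_records` and the column names are hypothetical.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, threshold=0.05):
    """Return indices of records whose assumed risk (1 / class size) exceeds threshold."""
    # Group records by their combination of quasi-identifier values.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [i for i, k in enumerate(keys) if 1.0 / class_sizes[k] > threshold]


# Hypothetical toy data: three records share one profile, one record is unique.
records = [
    {"age_group": "30-39", "region": "A"},
    {"age_group": "30-39", "region": "A"},
    {"age_group": "30-39", "region": "A"},
    {"age_group": "80-89", "region": "B"},
]

# With a loose threshold of 0.5, only the unique record is flagged.
print(risky_records(records, ["age_group", "region"], threshold=0.5))  # [3]
```

Under this assumption, a threshold of 0.05 means every record must share its quasi-identifier values with at least 19 others, so flagged records would be candidates for suppression or generalization.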

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Privacy-Preserving Release of Spatio-temporal Density

Author: A E Cicek
A Monreale
A Rajaraman
Ashwin Machanavajjhala
Benjamin C. M. Fung
C Li
Cynthia Dwork
G Kellaris
G Poulis
Georgios Kellaris
I Goodfellow
Latanya Sweeney
Linus Bengtsson
M E Nergiz
Manolis Terrovitis
Marta C. González
N Victor
P Neirotti
R Kitchin
Shubha U. Nabar
W Qardaji
X He
Y Xiao
Publication venue: Springer International Publishing
Publication date: 01/01/2018

In today’s digital society, increasing amounts of contextually rich spatio-temporal information are collected and used, e.g., for knowledge-based decision making, research purposes, optimizing operational phases of city management, planning infrastructure networks, or developing timetables for public transportation with an increasingly autonomous vehicle fleet. At the same time, however, publishing or sharing spatio-temporal data, even in aggregated form, is not always viable owing to the danger of violating individuals’ privacy, along with the related legal and ethical repercussions. In this chapter, we review some fundamental approaches for anonymizing and releasing spatio-temporal density, i.e., the number of individuals visiting a given set of locations as a function of time. These approaches follow different privacy models, which provide different privacy guarantees and different accuracy of the released anonymized data. We demonstrate some sanitization (anonymization) techniques with provable privacy guarantees by releasing the spatio-temporal density of Paris, France. We conclude that, in order to achieve meaningful accuracy, the sanitization process has to be carefully customized to the application and public characteristics of the spatio-temporal data.

Crossref

INRIA a CCSD electronic archive server

Repository of the Academy's Library